The kiwifruit (Actinidia chinensis) is an economically and nutritionally important fruit crop
with remarkably high vitamin C content. Here we report the draft genome sequence of
a heterozygous kiwifruit, assembled from ~140-fold next-generation sequencing data. The
assembled genome has a total length of 616.1 Mb and contains 39,040 genes. Comparative
genomic analysis reveals that the kiwifruit has undergone an ancient hexaploidization event
(γ) shared by core eudicots and two more recent whole-genome duplication events. Both
recent duplication events occurred after the divergence of kiwifruit from tomato and potato 
and have contributed to the neofunctionalization of genes involved in regulating important 
kiwifruit characteristics, such as fruit vitamin C, flavonoid and carotenoid metabolism. As the 
first sequenced species in the Ericales, the kiwifruit genome sequence provides a valuable 
resource not only for biological discovery and crop improvement but also for evolutionary and 
comparative genomics analysis, particularly in the asterid lineage. 

Table 1 | Kiwifruit genome assembly statistics.

                          Contig size (bp)   Contig number   Scaffold size (bp)   Scaffold number
N90                       11,574             11,427          122,658              1,053
N80                       23,123             7,788           256,012              725
N70                       34,283             5,660           376,247              530
N60                       45,944             4,137           496,035              387
N50                       58,864             2,977           646,786              280
Largest                   423,496            -               3,410,229            -
Average                   22,612             -               80,035               -
Total size                604,217,145        -               616,114,069          -
Total number (>200 bp)    -                  26,721          -                    7,698
Total number (>2 kb)      -                  21,713          -                    5,106

Actinidiaceae, the basal family within the Ericales, consists 
of the genera Actinidia, Saurauia and Clematoclethra 1 .
The genus Actinidia, commonly known as kiwifruit, 
includes several economically important horticultural species, 
such as Actinidia chinensis Planchon, A. deliciosa (A. chinensis 
var. deliciosa A. Chevalier), A. arguta (Siebold and Zuccarini) 
Planchon ex Miquel and A. eriantha Bentham 2 . Approximately 
54 species and 75 taxa have been described in Actinidia 3 , all of 
which are perennial, deciduous and dioecious plants with a 
climbing or straggling growth habit. The kiwifruit species are 
often reticulate polyploids with a base chromosome number of 
x = 29 (ref. 4).

The kiwifruit has long been called 'the king of fruits' because of 
its remarkably high vitamin C content and balanced nutritional 
composition of minerals, dietary fibre and other health-beneficial 
metabolites. Extensive studies on the metabolic accumulation of 
vitamin C, carotenoids and flavonoids have been reported in 
kiwifruits 5-13 . The centre of origin of kiwifruit is in the 
mountains and ranges of southwestern China. The kiwifruit has 
a short history of domestication, starting in the early 20th century 
when its seeds were introduced into New Zealand 14 . Through 
decades of domestication and substantial efforts for selection 
from wild kiwifruits, numerous varieties have been developed and 
kiwifruits have become an important fresh fruit worldwide with 
an annual production of 1.44 million tons in 2011
(http://faostat.fao.org).

Despite the availability of an extensive expressed sequence tag
(EST) database 15 and several genetic maps 16,17 , whole-genome 
sequence resources for the kiwifruit, which are critical for its 
breeding and improvement, are very limited. Kiwifruit belongs to 
the order Ericales in the asterid lineage. Currently, no genomes 
have been sequenced for species in the Ericales and in the asterid 
lineage only the genomes of Solanaceae species in the order 
Euasterids I, including the tomato 18 and potato 19 , have been 
sequenced. 

Here we sequence and analyse the genome of a heterozygous 
kiwifruit, 'Hongyang' (A. chinensis), which is widely grown in 
China. The availability of this genome sequence not only provides 
insight into the underlying molecular basis of specific agronomically
important traits of kiwifruit and its wild relatives but
also presents a valuable resource for elucidating evolutionary 
processes in the asterid lineage. 

Results 

Genome sequencing and assembly. One female individual of the
Chinese kiwifruit cultivar 'Hongyang' was selected for whole-genome
sequencing. 'Hongyang' is a heterozygous diploid
(2n = 2x = 58) that is derived from clonally selected wild germplasm
in central China and has not been subjected to further
selection and breeding 20 . Its oval-shaped fruit has a hairy,
greenish-brown skin, a slightly green or golden outer pericarp and a
red-fleshed inner pericarp with rows of tiny, black, edible seeds. Its
fruit is highly nutritious, containing abundant ascorbic
acid (vitamin C), carotenoids, flavonoids and anthocyanins
(Supplementary Table S1).

A total of 105.8 Gb high-quality sequences (Supplementary 
Table S2) were generated using the Illumina HiSeq 2000 system. 
This represented approximately 140× coverage of the kiwifruit
genome, which has an estimated size of 758 Mb based on flow
cytometry analysis 21 . De novo assembly of these sequences
with Allpaths-LG 22 yielded a draft genome of 616.1 Mb,
representing 81.3% of the kiwifruit genome (Table 1). The
genome assembly consists of 21,713 contigs and 5,110 scaffolds
(>2 kb), with N50 sizes of 58.8 and 646.8 kb for contigs and
scaffolds, respectively (Table 1). To determine scaffold placement
on kiwifruit pseudochromosomes, a high-density genetic map
was constructed using an F1 population derived from the cross
between 'Hongyang-MS-01' (male) and A. eriantha 'Jiangshanjiao'
(female). Genotyping of each individual in the F1 population
was determined using SLAF-seq 23 . The final map spanned 
5,504.5 cM across 29 linkage groups and was composed of 
4,301 single nucleotide polymorphism (SNP) markers, with a 
mean marker density of 1.28 cM per marker. Using 3,379 markers 
that were uniquely aligned to the assembled scaffolds, a total of 
853 scaffolds were anchored to the 29 kiwifruit pseudo- 
chromosomes, comprising 73.4% (452.4 Mb) of the kiwifruit 
genome assembly (Fig. 1). Of the 853 anchored scaffolds, 491 
could be oriented (333.6 Mb, 73.7% of the anchored sequences). 
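
The summary figures quoted above can be reproduced with simple arithmetic. The following Python sketch is offered as an illustration rather than as part of the original analysis: it recomputes the sequencing depth and assembled fraction from the values reported in the text and shows how an N50-type statistic is conventionally derived from a list of sequence lengths. The function names and the toy length list are assumptions made for this example.

```python
def nxx(lengths, fraction=0.5):
    """Return (size, number) for an Nxx statistic.

    size   = length of the shortest sequence in the smallest set of longest
             sequences that together cover `fraction` of the total length
    number = how many sequences are in that set
    """
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise ValueError("lengths must be non-empty")


# Values reported in the text (bases).
reads_bp = 105.8e9      # high-quality Illumina sequence
genome_bp = 758e6       # flow-cytometry genome size estimate
assembly_bp = 616.1e6   # total scaffold length

depth = reads_bp / genome_bp                 # roughly 140-fold
assembled_fraction = assembly_bp / genome_bp  # roughly 0.813

# Hypothetical toy lengths, only to exercise the function.
toy_lengths = [100, 80, 60, 40, 20]
n50_size, n50_number = nxx(toy_lengths, 0.5)
n90_size, n90_number = nxx(toy_lengths, 0.9)
```

Applied to the real contig or scaffold lengths, `nxx(lengths, 0.5)` would give an N50 size and number in the conventional sense used in Table 1.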

The GC content of the assembled genome was 35.2%, similar to 
that of the genomes of tomato (34%) 18 and potato (34.8%) 19 , 
which to date are the evolutionarily closest species to kiwifruit
with sequenced genomes (Supplementary Fig. S1). Furthermore,
we detected heterozygous sites by mapping the reads back
to the assembled genome, revealing a high level of heterozygosity 
(0.536%) in 'Hongyang', which was further supported by the 
K-mer distribution of the genomic reads (Supplementary Fig. S2). 

To evaluate the quality of the assembled genome, an 
independent Illumina library with an insert size of 500 bp was 
constructed and sequenced. The resulting reads were mapped to 
the assembled genome to identify homozygous SNPs and
structural variations (SVs), which represent potential base errors
and misassemblies in the genome, respectively. The analyses 
indicated that the assembly has a single base error rate of 0.03%, 
which is comparable to the rate of the tomato genome (0.02%) 18 . 
In addition, only 24 SVs were identified (Supplementary 
Table S3), indicating a very low frequency of misassemblies in 
the genome. The quality of the assembly was further assessed by 
aligning the EST sequences from the genus Actinidia 15 to the 
assembled genome. The analysis indicated that the assembly 
contained 97.3% of the 81,956 ESTs derived from A. chinensis, 
90.9% of the 83,924 ESTs from A. deliciosa and 94.3% of the 
19,574 ESTs from A. eriantha (Supplementary Table S4). 
Together, these analyses supported the high quality of our 
genome assembly. 

Repetitive sequence annotation. We identified a total of
~222 Mb (36% of the assembly) of repetitive sequences in the
kiwifruit genome. The content of repetitive sequences in the 
kiwifruit genome appears to be much less than that in tomato 
(63.2%) 18 and potato (62.2%) 19 , whereas it is more than that in 
Arabidopsis (14%) 24 and Thellungiella parvula (7.5%) 25 . 

Comparative analysis with known repeats in Repbase 26 and the
plant repeat database 27 indicated that 68.8% of the repetitive
sequences in the kiwifruit genome could be classified and
annotated. A large portion of the unclassified repetitive
sequences might be kiwifruit-specific. Retrotransposons made
up the majority of the repeats, among which the long terminal
repeat (LTR) family was the most abundant (~13.4% of the
assembly). Within the LTR family, Copia and Gypsy represented
the two most abundant subfamilies. In addition, DNA
transposons accounted for ~4.75% of the genome assembly
(Supplementary Table S5).

Figure 1 | Anchoring the Hongyang genome assembly to the diploid kiwifruit reference genetic map. 'Hongyang' (A. chinensis) genome scaffolds (blue)
were anchored to the linkage groups (yellow) of the A. chinensis × A. eriantha genetic map with 3,379 SNP markers.



Gene prediction and annotation. Using the EST sequences of
A. chinensis 15 and RNA-seq data that we generated from
A. chinensis leaves and fruits (Supplementary Fig. S3), integrated
with ab initio gene predictions and homologous sequence
searching, we predicted a total of 39,040 protein-encoding genes
with an average coding sequence length of 1,073 bp and 4.6 exons
per gene. Among these genes, 74.5 and 82.3% had significant
similarities to sequences in the non-redundant nucleotide and
protein databases in NCBI, respectively. Additionally, 37.4, 66.9,
81.9, 61.3 and 81.8% could be annotated using the COG, GO,
TrEMBL, Swiss-Prot and KEGG databases, respectively. Furthermore,
conserved domains in >65.5% of the predicted protein
sequences could be identified by comparing them against the InterPro
and Pfam databases. In addition, a total of 2,438 putative
transcription factors distributed in 58 families and 447
transcriptional regulators distributed in 22 families were identified
in the kiwifruit genome (Supplementary Data 1 and 2). In
addition to protein-coding genes, 293 rRNAs, 511 tRNAs, 236
miRNAs, 91 snRNAs and 307 snoRNAs were also identified.

Comparative analyses between kiwifruit and other plants.
Comparative analyses of the complete gene sets of kiwifruit,
Arabidopsis, rice, grape and tomato were performed. A total of
25,381 genes in the kiwifruit genome were assigned to 13,100
orthologous gene clusters. Among these clusters, 7,985 are common
to all five species, whereas 885 are confined to eudicots
(kiwifruit, Arabidopsis, grape and tomato). Within the eudicots,
337 gene clusters are restricted to plants with fleshy fruits (kiwifruit,
grape and tomato), whereas 1,455 clusters contain genes
only from kiwifruit (Fig. 2). Further functional characterization
based on GO terms revealed that the 337 fleshy fruit-specific
families were highly enriched with genes associated with fruit
quality, including those related to flavonoid, phenylpropanoid,
anthocyanin and oligosaccharide metabolism (Supplementary
Table S6). The kiwifruit-specific families were significantly enriched
with genes related to pollen tube reception and specification
of floral organ identity (Supplementary Table S7), both of which
are consistent with the high diversity of sex expression found in
kiwifruit 17,28 .

Among plants with sequenced genomes, tomato has the
closest evolutionary relationship to kiwifruit. Consequently, the
largest number of gene clusters (10,849) was shared between
kiwifruit and tomato, representing 82.8 and 82.5% of their
individual total gene clusters, respectively. We then calculated the
evolutionary rate for each of the orthologous gene pairs of
kiwifruit-grape, Arabidopsis-grape and tomato-grape. The
average ratio (ω) of non-synonymous (Ka) versus synonymous
(Ks) nucleotide substitution rates in kiwifruit (0.064) was found to
be greater than that in Arabidopsis (0.055) and tomato (0.052),
indicating that diversifying selection may have been stronger in
kiwifruit. 
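
As a small illustration of the ratio compared above, the sketch below averages ω = Ka/Ks over a set of orthologous gene pairs. It assumes that Ka and Ks have already been estimated by a codon-substitution method (the text does not say which one) and that the reported average is a mean of per-pair ratios; both points are assumptions, and the input values are hypothetical.

```python
def mean_omega(ka_ks_pairs):
    """Average Ka/Ks over gene pairs, skipping pairs with Ks == 0.

    ka_ks_pairs: iterable of (Ka, Ks) tuples, one per orthologous gene pair.
    """
    ratios = [ka / ks for ka, ks in ka_ks_pairs if ks > 0]
    if not ratios:
        raise ValueError("no gene pair with Ks > 0")
    return sum(ratios) / len(ratios)


# Hypothetical (Ka, Ks) values for three orthologous pairs.
example_pairs = [(0.05, 1.0), (0.10, 1.0), (0.03, 0.5)]
example_omega = mean_omega(example_pairs)
```

A mean ω well below 1, as in all three comparisons reported here, is usually read as predominantly purifying selection, so the differences between species are differences in degree.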

Whole-genome duplication in kiwifruit. Whole-genome duplication
(WGD) followed by gene loss has been found in most
eudicots and is regarded as the major evolutionary force that gives
rise to gene neofunctionalization in both plants and animals.
Within the kiwifruit genome, 588 paralogous relationships were
identified, covering 46% of the genome. We then compared the
kiwifruit genome sequence with those of tomato, potato and grape
and identified a large number of syntenic regions
(Fig. 3a). The distribution of 4DTv (transversions at fourfold
degenerate sites) and Ks values of homologous pairs in these
syntenic regions, as well as the mean Ks values of individual
syntenic blocks, indicated that an ancient WGD (the γ event),
which is shared by core eudicots, and two recent WGD events
had occurred in the evolutionary history of kiwifruit (Fig. 3b and
Supplementary Fig. S4a-c). In addition, using the method
described in Simillion et al. 29 , we were able to group kiwifruit
syntenic blocks into three age classes based on their mean Ks
values, further supporting the ancient triplication and the two
recent WGD events in kiwifruit (Supplementary Fig. S4d). The
two recent WGD events, Ad-α and Ad-β, were estimated to have
occurred ~26.7 and 72.9-101.4 million years ago, respectively,
based on Ks of paralogous genes. These results are consistent with
previous findings based on the EST analysis 30 . Both the Ad-α and
Ad-β events occurred after the kiwifruit-tomato or kiwifruit-
potato divergence (Fig. 3b). 
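
To make the 4DTv measure concrete, the following Python sketch computes the uncorrected fraction of transversions at fourfold degenerate third-codon positions for one pair of aligned coding sequences. It is a minimal illustration under stated assumptions: the standard genetic code, gap-free in-frame alignments, and no correction for multiple substitutions (published 4DTv values are often corrected, and the text does not state which procedure was used). The function name and example sequences are hypothetical.

```python
# Two-base codon prefixes whose third position is fourfold degenerate in the
# standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly families).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def raw_4dtv(cds_a, cds_b):
    """Uncorrected transversion fraction at shared fourfold degenerate sites."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and equal in length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Both codons must share the same fourfold-degenerate prefix.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        base_a, base_b = codon_a[2], codon_b[2]
        if base_a not in "ACGT" or base_b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (base_a in PURINES) != (base_b in PURINES):
            transversions += 1
    return transversions / sites if sites else float("nan")


# Hypothetical aligned coding sequences (four codons each).
seq_a = "GCTGGACTCATG"
seq_b = "GCAGGGCTGATG"
example_4dtv = raw_4dtv(seq_a, seq_b)
```

In a genome-wide analysis, values like this would be collected for all syntenic gene pairs and plotted as a distribution, where peaks at different distances correspond to duplication or speciation events of different ages.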

The relationship of orthologous genes in syntenic blocks 
between kiwifruit and grape was further analysed. We found that 
55.8% of kiwifruit gene models are in blocks that are orthologous 
to one grape region, collectively covering 73.6% of the grape gene 
space. Among these grape genomic regions, 19.1% have one 
orthologous region in kiwifruit, 20.5% have two, 23% have three, 
19.3% have four, 11% have five, 4.4% have six, 2.2% have seven 
and a few (<1%) have eight, nine and ten. This pattern is similar
to that of Arabidopsis 31 , whose genome has also undergone two
WGDs (At-α and At-β) following the ancient γ triplication. These
data further supported the occurrence of the two recent WGD 
events in kiwifruit, followed by extensive gene loss. 

Figure 2 | Venn diagram of orthologous gene families. Five species
(kiwifruit, Arabidopsis, grape, tomato and rice) were used to generate the
Venn diagram based on the gene family cluster analysis.



Gene expansion and neofunctionalization in kiwifruit. Kiwifruit
is well known for its high nutritional value because of its
extremely high ascorbic acid (vitamin C) content.
We investigated and compared genes involved in the ascorbic
acid biosynthesis and recycling pathway in kiwifruit, Arabidopsis,
grape, sweet orange and tomato. Although we found no expansion
in genes from the L-galactose pathway that forms the major
route to vitamin C biosynthesis in kiwifruit 32 , we did find that
other gene families involved in ascorbic acid biosynthesis,
including Alase (aldonolactonase), APX (L-ascorbate
peroxidase) and MIOX (myo-inositol oxygenase), and genes
responsible for ascorbic acid regeneration from its oxidized
forms, including MDHAR (monodehydroascorbate reductase),
exhibited an expansion in kiwifruit (Supplementary Table S8
and Supplementary Fig. S5). Phylogenetic analyses of genes in
these expanded families, combined with results from the synteny
analyses, indicated that the two recent WGDs in kiwifruit resulted
in additional gene family members that evolved to contribute
to the high vitamin C accumulation in the fruit of kiwifruit
(Fig. 4 and Supplementary Figs S6-S8). Most of the expanded
genes in the ascorbic acid biosynthesis and recycling pathway
were expressed in both leaves and fruits of kiwifruit, with a large
portion more highly expressed in fruits (especially immature
fruits) than in leaves (Supplementary Table S9). This is consistent
with the high level of ascorbic acid in both fruits and leaves of
'Hongyang' and a higher level in immature fruits (Supplementary
Table S1).

Figure 3 | Comparative analysis and duplication events in the kiwifruit genome. (a) Syntenic blocks between genomes of kiwifruit, tomato and grape. (b)
Whole-genome duplications in kiwifruit as revealed by the distribution of 4DTv distance between syntenically orthologous genes.

Kiwifruit also contains high levels of other important
nutritional compounds including carotenoids, flavonoids and
chlorophylls (Supplementary Table S1). Compared with Arabidopsis,
grape, sweet orange and tomato, expansions of gene
families in the carotenoid biosynthesis pathway, including
lycopene beta-cyclase, lycopene epsilon-cyclase, phytoene desaturase
and violaxanthin de-epoxidase, were observed in kiwifruit
(Supplementary Table S10 and Supplementary Fig. S9). In the
flavonoid biosynthesis pathway, gene families including chalcone
isomerase, flavanone 3-hydroxylase and flavonoid 3'-hydroxylase
were expanded (Supplementary Table S11 and Supplementary
Fig. S10). Although no large-scale gene expansion events in
chlorophyll biosynthesis and degradation are evident in kiwifruit,
an additional member encoding GluTR (glutamyl-tRNA reductase)
was found (Supplementary Table S12). GluTR is responsible
for the biosynthesis of 5-aminolevulinic acid, the key precursor of
chlorophyll 33,34 . Expression analyses indicated that almost all of
the identified expanded genes in the carotenoid, flavonoid and
chlorophyll metabolic pathways were expressed, exhibiting
temporal and tissue specificity (Supplementary Tables S13-S15).

It is worth noting that, unlike most other kiwifruit cultivars,
'Hongyang' fruits and leaves are also rich in
anthocyanins, a class of flavonoid compounds responsible
for the red colour of the inner pericarp of 'Hongyang'
(Supplementary Table S1). It is not surprising that there is no
expansion observed in kiwifruit for the key enzymes of
anthocyanin biosynthesis, including leucoanthocyanidin dioxygenase/anthocyanidin
synthase and UDP-glucose flavonoid
3-O-glucosyltransferase (Supplementary Table S11). Most genes
in the leucoanthocyanidin dioxygenase/anthocyanidin synthase
and UDP-glucose flavonoid 3-O-glucosyltransferase families were
highly expressed in both immature fruits and leaves, indicating
that anthocyanins might be mainly synthesized in the early
development stage of fruits (Supplementary Table S14).

Disease-resistance genes in kiwifruit. Plants have evolved two
layers of innate immunity to defend against potential pathogens:
pathogen-associated molecular pattern-triggered immunity (PTI) and
effector-triggered immunity. PTI is a relatively ancient form that
is triggered through the perception of pathogen-associated
molecular patterns by pattern-recognition receptors, whereas
effector-triggered immunity is conferred through the recognition
of pathogen-secreted effectors by the nucleotide-binding site and
leucine-rich repeat (NBS-LRR) genes 35 . In the kiwifruit genome,
a total of 96 NBS-LRR genes were identified (Supplementary
Data 3), which was comparable to the number of NBS-LRR genes
found in papaya (55) 36 and watermelon (44) 37 but considerably
fewer than those in Arabidopsis (166) 38 , rice (~600) 39 , grape
(504) 40 and tomato (251) 18 . As particular NBS-LRR genes
recognize specific pathogen effectors, the smaller number of NBS-LRR
genes in kiwifruit may represent less potential for pathogen
recognition. These data imply that NBS-LRR genes are not under
strong selection pressure in kiwifruit, possibly because fewer
pathogens have evolved to adapt to kiwifruit. The distribution
of NBS-LRR genes in the kiwifruit genome is not random, as
nearly one-third of them (27) are located in the genome within 10
clusters (Supplementary Data 3), suggesting that they have
evolved mainly through tandem duplications, similar to what
has been reported in other sequenced plant genomes 37-40 .

A total of 261 putative pattern-recognition receptor genes,
which encode receptor-like kinases with an LRR domain (RLK-LRR),
were identified in the kiwifruit genome (Supplementary
Data 4). This number is larger than that found in Arabidopsis
(220), grape (232) and tomato (236), suggesting that PTI, a type
of ancient innate immunity, is more conserved in kiwifruit and
may have an important role in defense against potential
pathogens.

Figure 4 | Phylogenetic and syntenic analyses of monodehydroascorbate reductase genes. (a) Phylogenetic tree of monodehydroascorbate reductase
(MDHAR) genes from kiwifruit (green), Arabidopsis (blue), grape (purple), sweet orange (yellow) and tomato (red). Heatmaps representing expression
levels of kiwifruit MDHAR genes in leaf, immature fruit, mature green fruit and ripe fruit (from left to right) are shown on the right of the tree. Log2-transformed
gene expression values were used to generate heatmaps. (b) Microsynteny of genome regions surrounding kiwifruit MDHAR genes
(Achn297231 and Achn389481), and their corresponding tomato (Solyc02g086710), potato (PGSC0003DMG400000486) and grape
(GSVIVT01032453001) orthologues. MDHAR genes are shown in red. Genes with no syntenic homologue(s) are in white. Other syntenic genes are shown
in different colours based on their presence/absence patterns in the genome regions.

Discussion 

A high-quality draft genome of a highly heterozygous kiwifruit 
cultivar 'Hongyang' has been successfully assembled, using the 
high coverage (~140×) of Illumina paired-end and mate-pair
reads. The assembly covers about 81.3% of the kiwifruit genome. 
The kiwifruit genome sequence presented in this study represents 
the first genome sequence of a member in the order Ericales 
and the third in the entire asterid lineage, after potato and 
tomato, thus providing a valuable resource for comparative 
genomics and evolutionary studies, especially in the asterid
lineage, which has far fewer available genomic resources than
the rosid lineage.

Besides the ancient hexaploidization event (γ) shared by core
eudicots, kiwifruit appears to have undergone two additional 
independent WGD events. Both events occurred after the 
divergence of kiwifruit from tomato and potato. Whether these 
two WGD events are shared by other members of Actinidiaceae 
or the Ericales will require further analysis when more genome 
sequences from the family or order become available. 

Kiwifruit is a rich source of ascorbic acid (vitamin C) and other 
health-beneficial compounds. Our analyses demonstrated that 
extensive expansions have occurred in the members of gene 
families involved in the ascorbic acid biosynthetic and recycling 
pathway, the carotenoid biosynthesis pathway and the flavonoid 
metabolism pathway. The majority of these expansions can be 
attributed to at least one of the two recent WGD events, 
indicating that WGD has played an important role in adding new 
gene family members that mediate important fruit-specific
attributes that contribute to the high nutritional value of kiwifruit. 

The kiwifruit genome sequence will be an invaluable resource 
for the genetic improvement of kiwifruit and for a better understanding
of genome evolution. It will also be invaluable in
developing new varieties and for resolving questions of 
agronomical and/or biological importance, such as those related 
to fruit development and ripening, fruit nutrient metabolism, 
disease resistance, sex determination and polyploidy evolution in 
kiwifruit and other related plant species. 

Methods 

Genome sequencing and assembly. High-quality genomic DNA was extracted
from young leaves of a 5-year-old female plant of A. chinensis cv. Hongyang
growing on the farm of the Sichuan Academy of Natural Resource Sciences, Sichuan
Province, China. An improved CTAB method was used to prepare the kiwifruit
genomic DNA. The modified CTAB extraction buffer included 0.1 M Tris-HCl,
0.02 M EDTA, 1.4 M NaCl, 3% (w/v) CTAB and 5% (w/v) PVP K40. Beta-mercaptoethanol
was added to the CTAB extraction buffer to ensure DNA integrity
and quality. RNase A and proteinase K were used to remove RNA and protein
contamination, respectively. Paired-end and mate-pair Illumina genomic DNA
libraries were constructed following the manufacturer's instructions (Illumina,
USA) using the prepared DNA from 'Hongyang'. The libraries were sequenced on
an Illumina HiSeq 2000 system. Raw reads were processed by removing PCR
duplicates, low-quality reads, adaptor sequences and contaminating reads of bacterial
or viral origin. Additionally, sequence errors were corrected based on the
K-mer frequency. The resulting high-quality cleaned reads were assembled into
contigs and scaffolds using Allpaths-LG 22 , which is based on de Bruijn graph theory.
Gaps within the scaffolds were filled using the GapCloser tool in the SOAPdenovo
package 41 . 
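
The two ideas mentioned above, K-mer frequency as a signal for sequencing errors and the de Bruijn graph as the assembly data structure, can be illustrated with a toy Python sketch. This is not the Allpaths-LG algorithm, which is far more elaborate; the function names, the value of k, the count threshold and the reads are all assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(kmer_counts, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer is an edge.

    K-mers seen fewer than `min_count` times are treated as likely
    sequencing errors and left out of the graph.
    """
    graph = defaultdict(set)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Hypothetical reads; the third carries a single-base error (GTT instead of GTA).
toy_reads = ["ACGTAC", "CGTACG", "ACGTTC"]
toy_counts = count_kmers(toy_reads, k=3)
toy_graph = de_bruijn_edges(toy_counts, min_count=2)
```

With the threshold of two, the rare k-mers created by the erroneous read drop out and the remaining edges form a single clean path, which is the behaviour that frequency-based error correction aims for on real data.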

Genetic map construction and scaffold anchoring. An F1 population was
generated from a cross between A. chinensis 'Hongyang-MS-01' (male) and
A. eriantha 'Jiangshanjiao' (female). 'Hongyang-MS-01' is a male variety and is
normally used as the pollinizer for the female cultivar 'Hongyang'. Owing to the
high heterozygosity in both parents, the F1 population could be considered as a
double pseudo-testcross. A total of 300 individuals were obtained and 108 of them 
were used for genetic map construction. 

High-quality genomic DNA was extracted from the 108 individuals and was 
used to construct Illumina sequencing libraries following the manufacturer's 
protocol (Illumina, USA). The genomic libraries were sequenced on an Illumina 
HiSeq 2000 system, and a total of 3.76 Gb of 80 bp reads were obtained. Average
sequencing depths of sequenced loci per parent and per progeny were 78.63- and
10.73-fold, respectively. Genotyping and evaluation of the quality of genetic
markers were performed as described in Sun et al. 23 . Briefly, the Illumina paired-end
reads were first clustered into the same group if they shared at least 90%
sequence identity. Alleles at each locus were then defined using the minimum
allele frequency evaluation, and markers with more than four alleles were 
discarded. The accuracy of the genotyping was evaluated based on the coverage 
of each allele and the number of single-nucleotide polymorphisms at each locus. 
A total of 5,320 high-quality SNP markers were identified and used to construct 
the genetic map with the double pseudo-testcross strategy using JoinMap 4.0. 
Using the nearest-neighbour method, a total of 4,301 markers were clustered into 
29 linkage groups, with LOD scores ranging from 4 to 20. The SNP markers were 
then aligned to the kiwifruit-assembled scaffolds, and only uniquely aligned 
markers were used to anchor and orient the scaffolds onto the 29 kiwifruit 
pseudochromosomes. The marker sequences are provided in Supplementary 
Data 5. 
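
The anchoring step can be pictured with the following Python sketch. The text does not spell out the decision rules, so the ones used here are assumptions: each scaffold is assigned to the linkage group holding most of its uniquely aligned markers, it is placed at the mean map position of those markers, and it is oriented only when its first and last markers (by scaffold coordinate) differ in map position. The function name and marker records are hypothetical.

```python
from collections import Counter, defaultdict


def anchor_scaffolds(markers):
    """Assign scaffolds to linkage groups from uniquely aligned markers.

    markers: iterable of (scaffold, scaffold_pos_bp, linkage_group, map_pos_cM).
    Returns {scaffold: (linkage_group, mean_cM, orientation)} where
    orientation is "+", "-" or None when it cannot be determined.
    """
    by_scaffold = defaultdict(list)
    for scaffold, pos, group, cm in markers:
        by_scaffold[scaffold].append((pos, group, cm))

    placements = {}
    for scaffold, hits in by_scaffold.items():
        # Majority vote on the linkage group.
        group, _ = Counter(h[1] for h in hits).most_common(1)[0]
        on_group = sorted((pos, cm) for pos, g, cm in hits if g == group)
        first_cm, last_cm = on_group[0][1], on_group[-1][1]
        if last_cm > first_cm:
            orientation = "+"
        elif last_cm < first_cm:
            orientation = "-"
        else:
            orientation = None  # a single marker or identical map positions
        mean_cm = sum(cm for _, cm in on_group) / len(on_group)
        placements[scaffold] = (group, mean_cm, orientation)
    return placements


# Hypothetical marker alignments.
toy_markers = [
    ("s1", 1000, 1, 10.0), ("s1", 50000, 1, 12.0), ("s1", 90000, 2, 3.0),
    ("s2", 500, 3, 40.0),
    ("s3", 200, 1, 30.0), ("s3", 9000, 1, 25.0),
]
toy_placements = anchor_scaffolds(toy_markers)
```

A rule of this kind would also explain why only some anchored scaffolds can be oriented: a scaffold carrying a single informative marker can be placed but not oriented.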

Quality assessment and heterozygous locus identification. An independent 
library with an insert size of 500 bp was constructed and sequenced to assess the 
quality of the genome assembly. Paired-end reads from this library were aligned to 
the kiwifruit genome assembly using BWA 42 . On the basis of the alignments, 
genotypes supported by at least 10 reads and with an allele frequency >0.3 were 
assigned to each genomic position. Homozygous SNPs, which represent potential 
sequence errors in the assembly, were then identified. SVs, which represent 
potential misassemblies, were also identified using SVDetect 43 . To determine the 
coverage of gene space by the assembly, EST sequences of A. chinensis, A. deliciosa 
and A. eriantha 15 were mapped to the genome assembly using BLASTN. 

To identify heterozygous sites, reads from all four short-insert (insert size 
< 500 bp) Illumina paired-end libraries were aligned to the kiwifruit genome 
sequences using BWA 42 . Only read pairs that were uniquely aligned to the genome 
were kept. Following alignment, the coverage of each genomic position by bases A, 
G, C and T was calculated. Genomic loci containing at least two alleles with each of 
them supported by at least 20 reads and with allele frequency of at least 0.3 were 
identified as heterozygous loci in the kiwifruit genome. Adjacent heterozygous loci 
that were separated by < 5 bp were discarded. 
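
The thresholds described above translate directly into a filter over per-position base counts. The Python sketch below is an illustration only: it assumes that the counts have already been extracted from uniquely aligned read pairs, and it interprets the final rule as discarding every locus that lies within 5 bp of another candidate. The function name and input structure are hypothetical.

```python
def call_heterozygous(base_counts, min_reads=20, min_freq=0.3, min_spacing=5):
    """Return positions called heterozygous on one sequence.

    base_counts: {position: {base: read_count}} for a single scaffold.
    A position is a candidate if at least two alleles are each supported by
    >= min_reads reads and have an allele frequency >= min_freq.
    Candidates closer than min_spacing bp to another candidate are dropped.
    """
    candidates = []
    for pos in sorted(base_counts):
        counts = base_counts[pos]
        depth = sum(counts.values())
        if depth == 0:
            continue
        supported = [
            base for base, n in counts.items()
            if n >= min_reads and n / depth >= min_freq
        ]
        if len(supported) >= 2:
            candidates.append(pos)

    kept = []
    for i, pos in enumerate(candidates):
        near_prev = i > 0 and pos - candidates[i - 1] < min_spacing
        near_next = i + 1 < len(candidates) and candidates[i + 1] - pos < min_spacing
        if not (near_prev or near_next):
            kept.append(pos)
    return kept


# Hypothetical per-position base counts.
toy_counts = {
    100: {"A": 30, "G": 28},   # candidate, but only 3 bp from position 103
    103: {"C": 25, "T": 25},   # candidate, but only 3 bp from position 100
    200: {"A": 40, "C": 22},   # candidate and isolated
    300: {"G": 60, "T": 5},    # minor allele has too few reads
    400: {"A": 50, "C": 20},   # minor allele frequency below 0.3
}
toy_het_sites = call_heterozygous(toy_counts)
```

Dividing the number of retained sites by the number of positions with sufficient coverage would then give a heterozygosity rate comparable in form to the one reported in the Results.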

Identification of repetitive sequence. Repeat sequences in the kiwifruit genome 
were first identified using three de novo prediction programs, LTR_FINDER 44 , 
RepeatScout 45 and PILER-DF 46 . The identified repeat sequences were used to 
construct a non-redundant repeat sequence library. Repeat sequences in the 
kiwifruit genome were then identified using RepeatMasker
(http://www.repeatmasker.org) and the constructed repeat sequence library. Additional
repeat sequences were identified by comparing the assembled genome sequences
against the Repbase database 26 and the plant repeat database 27 , using BLAST with
an e-value cutoff of 1e-5. Finally, repeat sequences with an identity >50% were
grouped into the same classes. 

RNA-seq data generation and expression analysis. Mature leaves, immature 
fruits (20 days after pollination (DAP)), mature green fruits (120 DAP) and ripe 
fruits (127 DAP) were collected from a 5-year-old 'Hongyang' plant. Fresh tissues 
were immediately frozen in liquid nitrogen and ground to fine powder. Total RNAs 
were isolated using the Trizol reagent (Invitrogen, USA) followed by treatment
with RNase-free DNase I (Promega, USA) according to the manufacturers'
protocols. The quality of RNAs was checked using an Agilent 2100 Bioanalyzer.
Illumina RNA-Seq libraries were prepared and sequenced on a HiSeq 2000 system 
following the manufacturer's instructions (Illumina, USA). Two to three biological 
replicates were performed for the fruit samples. The resulting paired-end 90-bp- or 
single-end 50-bp RNA-seq reads were first aligned to ribosomal RNA and tRNA 
sequences in order to remove possible contaminations of these sequences. The 
cleaned reads were then aligned to the kiwifruit genome assembly using TopHat 47 . 
Following the alignment, raw counts for each kiwifruit gene model were derived 
and normalized to fragments per kilobase of exon model per million mapped 
reads 48 . 
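
The normalization named above follows the standard definition of fragments per kilobase of exon model per million mapped fragments (FPKM). The short Python sketch below shows the arithmetic; the function name and the example numbers are assumptions, and for the single-end libraries each read counts as one fragment.

```python
def fpkm(fragment_count, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon model per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    # count / (length in kb) / (library size in millions)
    return fragment_count * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments, 2 kb of exons, 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
```

Because the value is scaled by both gene length and library size, it allows expression levels to be compared between genes and between the leaf and fruit libraries.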

Gene prediction and annotation. The repeat-masked kiwifruit genome sequences 
were used for gene prediction. The gene prediction pipeline combined ab initio 
gene predictions, homologous sequence searching and transcriptome sequence 
mapping (including both ESTs and RNA-seq data). The results from the three 
independent methods were merged into the final consensus of gene models using 
Glean 49 . Specifically, GeneScan 50 was employed for de novo gene prediction. 
Homologous sequence searching was performed by comparing protein sequences 
of Arabidopsis, rice, grape and tomato against the repeat-masked kiwifruit genome 
sequences using TBLASTN, requiring identity >60% and >80% of the
query sequence covered in the alignments. The corresponding kiwifruit genomic
regions were retrieved, together with sequences 1 kb downstream and upstream of 
the aligned regions. The alignments were further processed using GeneWise 51 to 
extract accurate exon-intron information. Illumina RNA-seq reads were assembled 
de novo into contigs using Trinity 48 . The resulting contig sequences, as well as 
A. chinensis EST sequences, were aligned to the repeat-masked kiwifruit genome 
sequences using BLAT 52 . Final gene sequences were derived through further 
analysis of the BLAT alignment results using PASA 53 . 
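
The homology-search filter and the retrieval of flanking sequence can be sketched as follows. This is a simplified illustration: the `Hit` record and its field names are hypothetical, coverage is computed from a single alignment record, and how alignments spanning several exons were aggregated per query is not stated in the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One protein-to-genome alignment record (hypothetical field names)."""
    query: str
    subject: str
    percent_identity: float
    query_length: int
    query_start: int    # 1-based, inclusive
    query_end: int
    subject_start: int  # 1-based; may exceed subject_end on the reverse strand
    subject_end: int


def passes_filter(hit, min_identity=60.0, min_query_coverage=80.0):
    """Keep hits with identity > 60 and more than 80% of the query aligned."""
    aligned = abs(hit.query_end - hit.query_start) + 1
    coverage = 100.0 * aligned / hit.query_length
    return hit.percent_identity > min_identity and coverage > min_query_coverage


def flanked_region(hit, scaffold_length, flank=1000):
    """Genomic interval of the hit plus 1 kb on each side, clipped to the scaffold."""
    low = min(hit.subject_start, hit.subject_end)
    high = max(hit.subject_start, hit.subject_end)
    return max(1, low - flank), min(scaffold_length, high + flank)
```

The intervals returned by `flanked_region` correspond to the sequences that would be passed on for exon-intron refinement.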

Annotation of the predicted genes was performed by blasting their sequences 
against a number of nucleotide and protein sequence databases, including COG, 
InterPro, nt, nr, KEGG, Swiss-Prot and TrEMBL, using an e-value cutoff of 1e-5. 
Functions of the predicted kiwifruit genes were assigned using AHRD (Automated 
assignment of Human Readable Descriptions; https://github.com/groupschoof/AHRD) 
as described previously 18 . Briefly, the 200 top-scoring search results of 
kiwifruit-predicted proteins against Swiss-Prot, TrEMBL, InterPro and Arabidopsis 
protein databases were scored based on alignment scores, expected quality of 
descriptions per database and a lexical scoring of individual 'words' computed from 
their frequency in the descriptions of top-scoring results. The highest-scoring 
description was assigned to each kiwifruit gene. GO terms were assigned to the 
annotated genes in kiwifruit and other sequenced plant species, including tomato, 
grape, apple and strawberry, using the Blast2GO pipeline 54 . tRNAs were identified 
using tRNAscan-SE 55 , and snRNAs and snoRNAs were identified by searching the 
genome assembly against the Rfam database using INFERNAL with default 
parameters (http://infernal.janelia.org/). rRNAs were identified by searching the 
genome assembly using the Rfam database as a reference with a cutoff of at least 
90% sequence identity and 80% coverage. Transcription factors/regulators were 
identified and classified into different families using the iTAK pipeline 
(http://bioinfo.bti.cornell.edu/tool/itak). 

Comparative analysis of gene sets. Protein sequences from kiwifruit, Arabidopsis, 
rice, grape and tomato were used to identify gene clusters. For genes with 
splice variants, only the variant encoding the longest protein sequence was used. 
Gene family clusters among different plant species were identified using 
OrthoMCL 1.4 (ref. 56). Pairwise sequence similarities between all protein 
sequences were calculated using BLASTP with an e-value cutoff of 1e-05. On the 
basis of the BLASTP results, OrthoMCL applied the Markov clustering 
algorithm to define the cluster structure, with the default inflation value 
(-I) of 1.5. 
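
The role of the inflation value can be illustrated with a minimal pure-Python sketch of Markov clustering on a small similarity matrix. It is not the OrthoMCL implementation: the edge weights are hypothetical stand-ins for BLASTP-derived similarities, and the convergence settings are assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=1.5, max_iter=100, tol=1e-6):
    """Cluster nodes of a weighted graph by alternating expansion and inflation."""
    n = len(adjacency)
    # Self-loops are added so that every column has a positive sum.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)                       # expansion: two-step random walk
        inflated = normalize_columns(                 # inflation: strengthen strong flows
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row with non-negligible entries is an attractor; its columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)
```

Higher inflation values sharpen the contrast between strong and weak flows and generally yield more, smaller clusters; lower values yield fewer, larger ones.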

KaKs_Calculator 57 was used to calculate Ka/Ks ratios as a measure of selective pressure. 
The phylogenetic trees were constructed in MEGA 5.0 using the neighbour-joining 
or maximum likelihood method 58 . The parameters used in the tree 
construction were the JTT model plus gamma-distributed rates and 1,000 
bootstrap replicates. 

Comparative genomics analysis. All-to-all BLASTP analysis of protein sequences 
was performed between kiwifruit and grape, tomato, and potato, respectively, as 
well as within each species, using an e-value cutoff of 1e-10, coverage > 50% and 
identity > 20%. Syntenic regions within each species and between kiwifruit and 
grape, tomato and potato were then identified using MCscan 31 based on the all-to- 
all BLASTP results. Protein sequences of homologous gene pairs in the identified 
syntenic regions were first aligned using MUSCLE 59 , and the protein alignments 
were then converted to the CDS alignments. Finally, 4DTv values were calculated 
on these CDS alignments and corrected using the HKY model, and Ks values were 
calculated using the yn00 program in the PAML package 60 . The kiwifruit syntenic 
blocks were grouped into different age classes based on their mean Ks values using 
the method described in Simillion et al. 29 Briefly, two duplication blocks were put 
into the same age class if the mean Ks values of both duplications did not differ 
significantly according to a t-test (P<0.01). A candidate age class was formed by taking a 
first duplication and adding to it the duplication that resulted in the age class with 
the lowest coefficient of variance. This process continued until no further 
duplications could be added to the age class without exceeding a coefficient of 
variance value of 0.5. Next, a second candidate age class was formed by starting 
with a second duplication and repeating the process. These steps were repeated for 
the remaining duplications until no more age classes could be defined containing 
five or more blocks. 
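
The greedy grouping described above can be sketched as follows. This is one possible reading of the procedure, not the implementation of Simillion et al.: it works on the mean Ks of each block only, omits the pairwise t-test (which needs the per-block Ks distributions), uses the population standard deviation, and discards seeds whose class stays below five blocks. All function names are illustrative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form, an assumption)."""
    mu = mean(values)
    return pstdev(values) / mu if mu > 0 else 0.0


def grow_class(seed, candidates, max_cv):
    """Repeatedly add the block that keeps the coefficient of variation lowest."""
    members = [seed]
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda ks: coefficient_of_variation(members + [ks]))
        if coefficient_of_variation(members + [best]) > max_cv:
            break
        members.append(best)
        remaining.remove(best)
    return members, remaining


def age_classes(block_mean_ks, max_cv=0.5, min_blocks=5):
    """Group duplicated blocks into age classes from their mean Ks values."""
    remaining = sorted(block_mean_ks)
    classes = []
    i = 0
    while i < len(remaining):
        seed = remaining[i]
        others = remaining[:i] + remaining[i + 1:]
        members, rest = grow_class(seed, others, max_cv)
        if len(members) >= min_blocks:
            classes.append(sorted(members))
            remaining = rest   # start again with the unassigned blocks
            i = 0
        else:
            i += 1             # this seed cannot form a large enough class
    return classes
```

With a threshold of 0.5 the classes are deliberately broad, so well-separated Ks peaks are needed for blocks to fall into distinct classes.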



Nutrient metabolite determinations. The content of various metabolites, 
including starch, sugar, chlorophyll, flavonoids, anthocyanins, carotenoids and 
ascorbic acid (vitamin C), was determined in 'Hongyang' leaves and fruits 
(including both inner and outer pericarps) at 20 DAP, 120 DAP and 127 DAP. 
Three biological replicates were performed for each analysis. Starch content was 
measured using the starch iodine reaction 61 . Briefly, samples extracted with 18% 
HCl were stained with the I₂-KI solution. After mixing, absorbance at 605 and 
530 nm was measured. The starch content was estimated according to the standard 
curve generated by the two standards of amylose and amylopectin mixed in various 
ratios. Sugar content was determined using the anthrone method 62 , with glucose as 
the standard. Flavonoids were extracted in 95% ethanol by ultrasonic methods 63 . 
The level of flavonoids was measured using the aluminium chloride colorimetric 
method 64 , with rutin as the standard. Total anthocyanin content was determined 
by the method described in Di Stefano et al. 65 Briefly, 1-g frozen sample was 
macerated twice to colourless with 25 ml extracting solvent (95% ethanol-0.1 N 
HCl) at 50 °C. Ten millilitres of filtered extract were diluted to 50 ml with pH 1.0 
(0.2 mol l⁻¹ KCl-0.2 mol l⁻¹ HCl-H₂O) and pH 4.5 (19.294 g NaAc and 
24 ml glacial acetic acid made up to 500 ml with ultrapure water) buffer solutions, 
respectively. The diluted extract was stored in the dark for 100 min, and the 
optical density (OD) was measured at the absorption maximum of the anthocyanin 
extract (530 nm). The anthocyanin content was calculated as 
$X = (\Delta\mathrm{OD} \times V \times F \times M \times 1{,}000)/(\varepsilon \times m)$, 
where $X$ = total anthocyanin content (mg g⁻¹); 
OD = absorbance reading of the diluted sample (1 cm cell); $V$ = diluted volume 
(ml); $F$ = dilution factor; $M$ = molecular weight of cyanidin-3-O-glucoside 
(449.38); $\varepsilon$ = molar extinction coefficient of cyanidin-3-O-glucoside ($2.69 \times 10^{4}$); 
and $m$ = sample weight. The content of ascorbic acid (vitamin C) was measured 
using the dinitrophenylhydrazine method 66 . Chlorophyll and carotenoids were 
repeatedly extracted in 80% aqueous acetone (10 ml) in darkness until the samples 
turned white. Absorbance was measured at 470, 663 and 645 nm. The 
relative content of chlorophyll a, chlorophyll b, total chlorophyll and total 
carotenoids was then calculated using formulae described in Lichtenthaler et al. 67 
and Arnon 68 : 
total carotenoids (mg g⁻¹) = $(1{,}000 \times A_{470} - 3.27 \times \text{chlorophyll } a - 104 \times \text{chlorophyll } b)/229$, 
chlorophyll a (mg g⁻¹) = $(12.7 \times A_{663} - 2.69 \times A_{645}) \times V/(1{,}000 \times W)$, 
chlorophyll b (mg g⁻¹) = $(22.9 \times A_{645} - 4.68 \times A_{663}) \times V/(1{,}000 \times W)$, 
total chlorophyll (mg g⁻¹) = $(8.02 \times A_{663} + 20.20 \times A_{645}) \times V/(1{,}000 \times W)$, 
where $V$ = volume of the extract (ml) and $W$ = weight of fresh tissues (g). 
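
The pigment formulae above translate directly into code. The sketch below implements them as written; the function and parameter names are illustrative. Two points are assumptions rather than statements from the text: `delta_od` is taken to be the difference between the readings in the pH 1.0 and pH 4.5 buffers, and the units of the chlorophyll a and b terms entering the carotenoid formula are left to the caller because the text does not specify them.

```python
def anthocyanin_content(delta_od, diluted_volume_ml, dilution_factor, sample_weight_g,
                        molecular_weight=449.38, molar_extinction=2.69e4):
    """X = (dOD * V * F * M * 1,000) / (epsilon * m), as given in the text."""
    numerator = delta_od * diluted_volume_ml * dilution_factor * molecular_weight * 1000
    return numerator / (molar_extinction * sample_weight_g)


def chlorophyll_a(a663, a645, volume_ml, fresh_weight_g):
    """Chlorophyll a content from absorbance at 663 and 645 nm."""
    return (12.7 * a663 - 2.69 * a645) * volume_ml / (1000 * fresh_weight_g)


def chlorophyll_b(a663, a645, volume_ml, fresh_weight_g):
    """Chlorophyll b content from absorbance at 663 and 645 nm."""
    return (22.9 * a645 - 4.68 * a663) * volume_ml / (1000 * fresh_weight_g)


def total_chlorophyll(a663, a645, volume_ml, fresh_weight_g):
    """Total chlorophyll content from absorbance at 663 and 645 nm."""
    return (8.02 * a663 + 20.20 * a645) * volume_ml / (1000 * fresh_weight_g)


def total_carotenoids(a470, chl_a, chl_b):
    """(1,000 * A470 - 3.27 * chl a - 104 * chl b) / 229, as given in the text."""
    return (1000 * a470 - 3.27 * chl_a - 104 * chl_b) / 229
```

For the dilution described above (10 ml of extract brought to 50 ml), the call would use `diluted_volume_ml=50` and `dilution_factor=5`. The total chlorophyll value is close to, but not exactly, the sum of the chlorophyll a and b values because the coefficients are rounded.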